NSF PAR Search | NSF Public Access Repository

Relational Diagrams and the Pattern Expressiveness of Relational Languages

https://doi.org/10.1145/3733620.3733637

Gatterbauer, Wolfgang; Dunne, Cody (April 2025, ACM SIGMOD Record)

Comparing relational languages by their logical expressiveness is well understood. Less understood is how to compare relational languages by their ability to represent relational query patterns. Indeed, what are query patterns other than "a certain way of writing a query"? And how can query patterns be defined across procedural and declarative languages, irrespective of their syntax? Our SIGMOD 2024 paper proposes a semantic definition of relational query patterns that uses a variant of structure-preserving mappings between the relational tables of queries. This formalism allows us to analyze the relative pattern expressiveness of relational languages. Notably, for the non-disjunctive language fragment, we show that relational calculus (RC) can express a larger class of patterns than the basic operators of relational algebra (RA). We also propose Relational Diagrams, a complete and sound diagrammatic representation of safe relational calculus. These diagrams can represent all query patterns for unions of non-disjunctive queries, in contrast to visual query representations that derive visual marks from the basic operators of algebra. Our anonymously preregistered user study shows that Relational Diagrams allow users to recognize relational patterns meaningfully faster and more accurately than they can with SQL.
Free, publicly-accessible full text available April 28, 2026
Reproducibility Report for ACM SIGMOD 2024 Paper: 'On The Reasonable Effectiveness of Relational Diagrams'

https://doi.org/10.1145/3687998.3717044

Vinciguerra, Giorgio; Yang, Guang; Gatterbauer, Wolfgang; Dunne, Cody (March 2025, ACM)

Free, publicly-accessible full text available March 21, 2026
Ranked Enumeration for Database Queries

https://doi.org/10.1145/3703922.3703924

Tziavelis, Nikolaos; Gatterbauer, Wolfgang; Riedewald, Mirek (November 2024, ACM SIGMOD Record)

Ranked enumeration is a query-answering paradigm where the query answers are returned incrementally in order of importance (instead of returning all answers at once). Importance is defined by a ranking function that can be specific to the application, but typically involves either a lexicographic order (e.g., ORDER BY R.A, S.B in SQL) or a weighted sum of attributes (e.g., ORDER BY 3*R.A + 2*S.B). Recent work has introduced any-k algorithms for (multi-way) join queries, which push ranking into joins and avoid materializing intermediate results until necessary. The top-ranked answers are returned asymptotically faster than the common join-then-rank approach of database systems, resulting in orders-of-magnitude speedup in practice.
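As a rough illustration of the idea of returning join answers incrementally in ranked order, here is a minimal sketch for a two-way equi-join ranked by a sum of weights. It is not the any-k algorithm from the paper; the toy schema (lists of `(join_key, weight)` pairs) and the name `ranked_join` are assumptions made for this example.

```python
import heapq
from collections import defaultdict
from itertools import islice

def ranked_join(R, S):
    """Lazily yield (total_weight, key, r_weight, s_weight) in nondecreasing total weight.

    R and S are lists of (join_key, weight) pairs (a hypothetical toy schema).
    """
    r_groups, s_groups = defaultdict(list), defaultdict(list)
    for key, w in R:
        r_groups[key].append(w)
    for key, w in S:
        s_groups[key].append(w)

    # Seed the priority queue with the cheapest pair of every joining key.
    heap = []
    for key in r_groups:
        if key in s_groups:
            r_groups[key].sort()
            s_groups[key].sort()
            heap.append((r_groups[key][0] + s_groups[key][0], key, 0, 0))
    heapq.heapify(heap)

    while heap:
        total, key, i, j = heapq.heappop(heap)
        rs, ss = r_groups[key], s_groups[key]
        yield total, key, rs[i], ss[j]
        # Expand the frontier; each (i, j) is pushed exactly once.
        if i + 1 < len(rs):
            heapq.heappush(heap, (rs[i + 1] + ss[j], key, i + 1, j))
        if i == 0 and j + 1 < len(ss):
            heapq.heappush(heap, (rs[0] + ss[j + 1], key, 0, j + 1))

R = [("a", 1), ("a", 4), ("b", 2)]
S = [("a", 3), ("a", 5), ("b", 1), ("c", 0)]
top2 = list(islice(ranked_join(R, S), 2))  # only the first two answers are produced
```

Because `ranked_join` is a generator, asking for the top two answers never materializes the remaining join results, which is the contrast with a join-then-rank plan.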
Full Text Available
A Comprehensive Tutorial on Over 100 Years of Diagrammatic Representations of Logical Statements and Relational Queries

https://doi.org/10.1109/ICDE60146.2024.00407

Gatterbauer, Wolfgang (May 2024, IEEE)

Full Text Available
Minimally Factorizing the Provenance of Self-join Free Conjunctive Queries

https://doi.org/10.1145/3651605

Makhija, Neha; Gatterbauer, Wolfgang (May 2024, Proceedings of the ACM on Management of Data)

We consider the problem of finding the minimal-size factorization of the provenance of self-join-free conjunctive queries, i.e., we want to find a formula that minimizes the number of variable repetitions. This problem is equivalent to solving the fundamental Boolean formula factorization problem for the restricted setting of the provenance formulas of self-join-free queries. While general Boolean formula minimization is Σ^p₂-complete, we show that the problem is NP-complete in our case. Additionally, we identify a large class of queries that can be solved in PTIME, expanding beyond the previously known tractable cases of read-once formulas and hierarchical queries. We describe connections between factorizations, Variable Elimination Orders (VEOs), and minimal query plans. We leverage these insights to create an Integer Linear Program (ILP) that can solve the minimal factorization problem exactly. We also propose a Max-Flow Min-Cut (MFMC) based algorithm that gives an efficient approximate solution. Importantly, we show that both the Linear Programming (LP) relaxation of our ILP and our MFMC-based algorithm are always correct for all currently known PTIME cases. Thus, we present two unified algorithms (ILP and MFMC) that can both recover all known PTIME cases in PTIME, yet also solve NP-complete cases either exactly (ILP) or approximately (MFMC), as desired.
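To make "minimizing the number of variable repetitions" concrete, here is a small worked example. The query, the database instance, and the tuple names below are hypothetical and chosen only for illustration; they do not come from the paper.

```latex
% Hypothetical query:    Q :- R(x), S(x,y), T(y)
% Hypothetical instance: R = {r_1},  S = {s_{11}, s_{12}},  T = {t_1, t_2}
\begin{align*}
  \varphi_{\mathrm{DNF}} &= r_1 s_{11} t_1 \;\lor\; r_1 s_{12} t_2
      && \text{6 variable occurrences ($r_1$ appears twice)} \\
  \varphi_{\mathrm{fact}} &= r_1 \,( s_{11} t_1 \;\lor\; s_{12} t_2 )
      && \text{5 variable occurrences (every variable appears once)}
\end{align*}
```

Both formulas are logically equivalent. The second is read-once, so no shorter factorization exists for this instance.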
Full Text Available
HITSnDIFFs: From Truth Discovery to Ability Discovery by Recovering Matrices with the Consecutive Ones Property

https://doi.org/10.1109/ICDE60146.2024.00025

Chen, Zixuan; Mitra, Subhodeep; Ravi, R; Gatterbauer, Wolfgang (May 2024, IEEE)

Full Text Available
On The Reasonable Effectiveness of Relational Diagrams: Explaining Relational Query Patterns and the Pattern Expressiveness of Relational Languages

https://doi.org/10.1145/3639316

Gatterbauer, Wolfgang; Dunne, Cody (March 2024, Proceedings of the ACM on Management of Data)

Comparing relational languages by their logical expressiveness is well understood. Less well understood is how to compare relational languages by their ability to represent relational query patterns. Indeed, what are query patterns other than a certain way of writing a query? And how can query patterns be defined across procedural and declarative languages, irrespective of their syntax? To the best of our knowledge, we provide the first semantic definition of relational query patterns by using a variant of structure-preserving mappings between the relational tables of queries. This formalism allows us to analyze the relative pattern expressiveness of relational language fragments and create a hierarchy of languages with equal logical expressiveness yet different pattern expressiveness. Notably, for the non-disjunctive language fragment, we show that relational calculus can express a larger class of patterns than the basic operators of relational algebra. Our language-independent definition of query patterns opens novel paths for assisting database users. For example, these patterns could be leveraged to create visual query representations that faithfully represent query patterns, speed up interpretation, and provide visual feedback during query editing. As a concrete example, we propose Relational Diagrams, a complete and sound diagrammatic representation of safe relational calculus that is provably (i) unambiguous, (ii) relationally complete, and (iii) able to represent all query patterns for unions of non-disjunctive queries. Among all diagrammatic representations for relational queries that we are aware of, ours is the only one with these three properties. Furthermore, our anonymously preregistered user study shows that Relational Diagrams allow users to recognize patterns meaningfully faster and more accurately than SQL.
Full Text Available
A Unified Approach for Resilience and Causal Responsibility with Integer Linear Programming (ILP) and LP Relaxations

https://doi.org/10.1145/3626715

Makhija, Neha; Gatterbauer, Wolfgang (December 2023, Proceedings of the ACM on Management of Data)

What is a minimal set of tuples to delete from a database in order to eliminate all query answers? This problem is called the resilience of a query and is one of the key algorithmic problems underlying various forms of reverse data management, such as view maintenance, deletion propagation and causal responsibility. A long-open question is determining the conjunctive queries (CQs) for which resilience can be solved in PTIME. We shed new light on this problem by proposing a unified Integer Linear Programming (ILP) formulation. It is unified in that it can solve both previously studied restrictions (e.g., self-join-free CQs under set semantics that allow a PTIME solution) and new cases (all CQs under set or bag semantics). It is also unified in that all queries and all database instances are treated with the same approach, yet the algorithm is guaranteed to terminate in PTIME for all known PTIME cases. In particular, we prove that for all known easy cases, the optimal solution to our ILP is identical to a simpler Linear Programming (LP) relaxation, which implies that standard ILP solvers return the optimal solution to the original ILP in PTIME. Our approach allows us to explore new variants and obtain new complexity results. 1) It works under bag semantics, for which we give the first dichotomy results in the problem space. 2) We extend our approach to the related problem of causal responsibility and give a more fine-grained analysis of its complexity. 3) We recover easy instances for generally hard queries, including instances with read-once provenance and instances that become easy because of Functional Dependencies in the data. 4) We solve an open conjecture about a unified hardness criterion from PODS 2020 and prove the hardness of several queries of previously unknown complexity. 5) Experiments confirm that our findings accurately predict the asymptotic running times, and that our universal ILP is at times even quicker than a previously proposed dedicated flow algorithm.
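The definition of resilience in the first sentences can be made concrete with a brute-force sketch under set semantics. This is not the ILP formulation of the paper: it simply tries deletion sets of increasing size until every witness (a set of tuples that jointly produce an answer) is hit. The query `Q :- R(x, y), S(y, z)` and the function names are assumptions made for this example.

```python
from itertools import combinations

def witnesses(R, S):
    """Witnesses of the hypothetical query Q :- R(x, y), S(y, z)."""
    return [frozenset({("R", r), ("S", s)}) for r in R for s in S if r[1] == s[0]]

def resilience(R, S):
    """Return (size, deletion_set) of a minimum set of tuples hitting every witness.

    Exponential-time brute force, intended only for tiny instances.
    """
    ws = witnesses(R, S)
    tuples = sorted(set().union(*ws)) if ws else []
    for k in range(len(tuples) + 1):
        for candidate in combinations(tuples, k):
            chosen = set(candidate)
            # A deletion set works if it removes at least one tuple of every witness.
            if all(w & chosen for w in ws):
                return k, chosen

R = [(1, 2), (3, 2)]
S = [(2, 5), (2, 6), (2, 7)]
size, deleted = resilience(R, S)  # deleting both R tuples is cheaper than all three S tuples
```

The same condition, one "delete at least one tuple" constraint per witness with 0/1 variables per tuple, is the natural way to phrase the problem as an integer program.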
Full Text Available
A Tutorial on Visual Representations of Relational Queries

https://doi.org/10.14778/3611540.3611578

Gatterbauer, Wolfgang (August 2023, Proceedings of the VLDB Endowment)

Query formulation is increasingly performed by systems that need to guess a user's intent (e.g., via spoken word interfaces). But how can a user know that the computational agent is returning answers to the right query? More generally, given that relational queries can become pretty complicated, how can we help users understand existing relational queries, whether human-generated or automatically generated? Now seems the right moment to revisit a topic that predates the birth of the relational model: developing visual metaphors that help users understand relational queries. This lecture-style tutorial surveys the key visual metaphors developed for visual representations of relational expressions. We will survey the history and state of the art of relationally complete diagrammatic representations of relational queries, discuss the key visual metaphors developed in over a century of investigating diagrammatic languages, and organize the landscape by mapping their used visual alphabets to the syntax and semantics of Relational Algebra (RA) and Relational Calculus (RC).
Full Text Available
DomainNet: Homograph Detection and Understanding in Data Lake Disambiguation

https://doi.org/10.1145/3612919

Leventidis, Aristotelis; Di Rocco, Laura; Gatterbauer, Wolfgang; Miller, Renée J; Riedewald, Mirek (September 2023, ACM Transactions on Database Systems)

Modern data lakes are heterogeneous in the vocabulary that is used to describe data. We study a problem of disambiguation in data lakes: How can we determine if a data value occurring more than once in the lake has different meanings and is therefore a homograph? While word and entity disambiguation have been well studied in computational linguistics, data management, and data science, we show that data lakes provide a new opportunity for disambiguation of data values, because tables implicitly define a massive network of interconnected values. We introduce DomainNet, which efficiently represents this network, and investigate to what extent it can be used to disambiguate values without requiring any supervision. DomainNet leverages network-centrality measures on a bipartite graph whose nodes represent data values and attributes to determine if a value is a homograph. A thorough experimental evaluation demonstrates that state-of-the-art techniques in domain discovery cannot be re-purposed to compete with our method. Specifically, using a domain discovery method to identify homographs achieves an F1-score of 0.38 versus 0.69 for DomainNet, which separates homographs well from data values that have a unique meaning. On a real data lake, our top-100 precision is 93%. Given a homograph, we also present a novel method for determining the number of meanings of the homograph and for assigning its data lake attributes to a meaning. We show the influence of homographs on two downstream tasks: entity-matching and domain discovery.
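The bipartite value/attribute graph and the use of a centrality score can be sketched as follows. This is not the DomainNet implementation: betweenness centrality (computed here with Brandes' algorithm) is used as one plausible centrality measure, and the toy columns and function names are assumptions made for this example.

```python
from collections import defaultdict, deque

def build_graph(columns):
    """Bipartite graph: one node per attribute, one per distinct value, edge if the value occurs in the column."""
    adj = defaultdict(set)
    for attr, values in columns.items():
        for v in values:
            adj[("attr", attr)].add(("val", v))
            adj[("val", v)].add(("attr", attr))
    return adj

def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes); each unordered pair is counted twice."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)   # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:                    # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):       # accumulate dependencies back toward s
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc

columns = {
    "animal": ["Jaguar", "Puma", "Lion"],
    "car_brand": ["Jaguar", "Toyota", "Fiat"],
}
scores = betweenness(build_graph(columns))
value_scores = {n[1]: c for n, c in scores.items() if n[0] == "val"}
candidate = max(value_scores, key=value_scores.get)  # the value bridging both columns
```

In this toy lake, "Jaguar" is the only value connecting the two otherwise unrelated columns, so it lies on every shortest path between them and receives the highest score among the value nodes.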
Full Text Available
